Weak factor automata: the failure of failure factor oracles?
Abstract
In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed into a so-called failure DFA (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs by number of transitions for many DNA sequences of lengths 4 to 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space obtained by using an FFO instead of an FO.
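To make the plain factor oracle mentioned in the abstract concrete, here is a minimal Python sketch of the standard online construction usually attributed to Allauzen, Crochemore and Raffinot. It is not the FFO construction the abstract describes, which is not given on this page. The names `build_factor_oracle`, `accepts`, `trans` and `supply` are hypothetical and chosen only for illustration.

```python
def build_factor_oracle(p):
    """Build a factor oracle for string p (illustrative sketch).

    Returns (trans, supply): trans[q] maps a symbol to a target state,
    supply[q] is the supply (failure-like) link used during construction.
    States are 0..len(p) and all of them are accepting.
    """
    m = len(p)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)          # supply[0] = -1 marks "undefined"
    for i, c in enumerate(p):
        trans[i][c] = i + 1          # internal transition i -> i+1
        k = supply[i]
        # Follow supply links, adding external transitions on c
        # until a state that already has a c-transition is reached.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][c]
    return trans, supply


def accepts(trans, w):
    """Return True if the oracle can read all of w from state 0."""
    q = 0
    for c in w:
        if c not in trans[q]:
            return False
        q = trans[q][c]
    return True


p = "abbbaab"
trans, supply = build_factor_oracle(p)
n_transitions = sum(len(t) for t in trans)
print(len(trans), n_transitions)     # number of states, number of transitions
print(accepts(trans, "bbaa"))        # a genuine factor of p
print(accepts(trans, "abba"))        # accepted although it is not a factor of p
```

For `p = "abbbaab"` this yields 8 states and 11 transitions, and the oracle accepts `"abba"` even though that string does not occur in `p`. This is why the structure is described as a weak factor recognizer: it accepts at least all factors, possibly more.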
Similar resources
Weak Factor Automata: Comparing (Failure) Oracles and Storacles
The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+1 states which are all final, is homogeneous, and has m to 2m − 1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its numbe...
Constructing Factor Oracles
A factor oracle is a data structure for weak factor recognition. It is an automaton built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+1 states which are all final, and has m to 2m−1 transitions. In this paper, we give two alternative algorithms for its construction and prove the constructed automata to be equivalent to the automata constructed by the a...
Error analysis of factor oracles
Factor oracles [1] constructed from a given text are deterministic acyclic automata accepting all substrings of the text. Factor oracles are more space-economical and easier to implement than similar data structures such as suffix trees [6]. There is, however, a drawback: a factor oracle may accept strings not in the text, which we call an error acceptance. In this paper, we characterize factor or...
Laboratory Model Tests to Study the Behavior of Soil Wall Reinforced by Weak Reinforcing Layers
In this paper we suggest a method to calculate the first integrals of a special system of first-order differential equations. Then we use the method to find the solutions of some differential equations, such as the differential equation of an RLC circuit.
Multi-factor failure mode critically analysis using TOPSIS
The paper presents a multi-factor decision-making approach for prioritizing failure modes as an alternative to the traditional approach of failure mode and effect analysis (FMEA). The approach is based on the ‘technique for order preference by similarity to ideal solution’ (TOPSIS). The priority ranking is formulated on the basis of six parameters (failure occurrence, non-detection, maintainability...
Journal:
- South African Computer Journal
Volume 53
Publication date: 2014